Graph Mining: Repository vs. Canonical Form
Abstract
In frequent subgraph mining, one tries to find all subgraphs that occur with a user-specified minimum frequency in a given graph database. The basic approach is to grow subgraphs, adding an edge and possibly a node in each step, to count the number of database graphs containing them, and to eliminate infrequent subgraphs. The predominant method to avoid redundant search (the same subgraph can be grown in several ways) is to define a canonical form that uniquely identifies a graph up to isomorphism. The obvious alternative, a repository of processed subgraphs, has received fairly little attention so far. However, if the repository is laid out as a hash table with a carefully designed hash function, this approach is competitive with canonical form pruning. In the experiments we conducted, the repository-based approach sometimes outperformed canonical form pruning by 15%.
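The repository idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Graph`, `invariant_key`, `isomorphic`, and `Repository` are hypothetical, the hash key is a simple label-and-degree invariant chosen for illustration rather than the carefully designed hash function the abstract mentions, and the brute-force isomorphism test is only practical for very small subgraphs.

```python
from itertools import permutations


class Graph:
    """Small undirected labeled graph (hypothetical helper for this sketch)."""

    def __init__(self, labels, edges):
        self.labels = list(labels)  # labels[i] is the label of node i
        # Store each edge once, keyed by (smaller index, larger index).
        self.edges = {(min(u, v), max(u, v)): lab for u, v, lab in edges}

    def degree(self, n):
        return sum(1 for e in self.edges if n in e)


def invariant_key(g):
    """Hashable key that is identical for isomorphic graphs.

    Non-isomorphic graphs may share a key, so a full test is still needed.
    """
    nodes = sorted((g.labels[n], g.degree(n)) for n in range(len(g.labels)))
    edges = sorted(
        (min(g.labels[u], g.labels[v]), lab, max(g.labels[u], g.labels[v]))
        for (u, v), lab in g.edges.items()
    )
    return (tuple(nodes), tuple(edges))


def isomorphic(a, b):
    """Brute-force isomorphism test; exponential, for tiny graphs only."""
    n = len(a.labels)
    if n != len(b.labels) or len(a.edges) != len(b.edges):
        return False
    for perm in permutations(range(n)):
        # The node mapping must preserve node labels.
        if any(a.labels[i] != b.labels[perm[i]] for i in range(n)):
            continue
        # Every edge of a must map to an edge of b with the same label.
        ok = True
        for (u, v), lab in a.edges.items():
            key = (min(perm[u], perm[v]), max(perm[u], perm[v]))
            if key not in b.edges or b.edges[key] != lab:
                ok = False
                break
        if ok:
            return True
    return False


class Repository:
    """Hash table of already processed subgraphs, bucketed by invariant key."""

    def __init__(self):
        self.buckets = {}

    def add_if_new(self, g):
        """Return True and store g if no isomorphic graph was seen before."""
        bucket = self.buckets.setdefault(invariant_key(g), [])
        if any(isomorphic(g, h) for h in bucket):
            return False
        bucket.append(g)
        return True


# The same fragment C-C-O reached by two different growth orders.
repo = Repository()
first = Graph(["C", "C", "O"], [(0, 1, "-"), (1, 2, "-")])
second = Graph(["O", "C", "C"], [(0, 1, "-"), (1, 2, "-")])
print(repo.add_if_new(first), repo.add_if_new(second))
```

In a miner built this way, a grown subgraph would be extended further only when `add_if_new` reports it as new. The quality of the key matters: the more non-isomorphic graphs share a bucket, the more often the expensive isomorphism test has to run.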
Related papers
On Canonical Forms for Frequent Graph Mining
In approaches to frequent graph mining that are based on growing subgraphs into a set of graphs, one of the core problems is how to avoid redundant search. A powerful technique to overcome this problem is a canonical description of a graph, which uniquely identifies it, and a corresponding test. This paper introduces a family of canonical forms that are based on systematic ways to construct spa...
Canonical Forms for Frequent Graph Mining
A core problem of approaches to frequent graph mining, which are based on growing subgraphs into a set of graphs, is how to avoid redundant search. A powerful technique for this is a canonical description of a graph, which uniquely identifies it, and a corresponding test. I introduce a family of canonical forms based on systematic ways to construct spanning trees. I show that the canonical form...
Reducing the Number of Canonical Form Tests for Frequent Subgraph Mining
Frequent connected subgraph (FCS) mining is an interesting problem with wide applications in real life. Most of the FCS mining algorithms have been focused on detecting duplicate candidates using canonical form tests. Canonical form tests have high computational complexity, and therefore, they affect the efficiency of graph miners. In this paper, we introduce novel properties to reduce the numb...
PGR: A Graph Repository of Protein 3D-Structures
Graph theory and graph mining constitute rich fields of computational techniques to study the structures, topologies and properties of graphs. These techniques constitute a good asset in bioinformatics if there exist efficient methods for transforming biological data into graphs. In this paper, we present Protein Graph Repository (PGR), a novel database of protein 3D-structures transformed into...
Combining Ring Extensions and Canonical Form Pruning
A common problem in frequent graph mining is the size of the output, which can easily exceed the size of the database to analyze. In the application area of molecular fragment mining a promising approach to tackle this problem is to treat certain substructures as a unit. Among such structures, rings are most prominent, and by requiring that either a ring is present as a whole in a fragment, or ...